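The birth of the GPU was a fundamental shift, driven by the 'Real-Time Imperative': a complex 3D scene must be rendered within $1/60$ of a second (16.67 ms), and this requirement is non-negotiable. CPUs followed a multicore trajectory optimized for low-latency serial execution, and as resolutions increased their performance hit a bottleneck.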
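1. The 16.67 ms Limit
In the mid-1990s, the gaming industry faced a crisis. A single-threaded CPU that was also busy with AI and physics could not compute the millions of pixel values needed each second, so smooth frame rates could not be sustained. This pushed the industry to build dedicated hardware to offload the repetitive graphics pipeline.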
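The 16.67 ms figure turns into a per-pixel budget once it is divided by the number of pixels in a frame. Below is a minimal sketch that assumes the entire frame budget goes to pixel work, which is an idealization. The function names `frame_budget_ms` and `per_pixel_budget_ns` are hypothetical and exist only for illustration.

```python
# Sketch: time available per frame, and per pixel, if every pixel must be
# finished before the next display refresh. Idealized: assumes the whole
# frame budget is spent on pixels (no AI, physics, or other work).

def frame_budget_ms(fps: float) -> float:
    """Time available for one frame, in milliseconds."""
    return 1000.0 / fps


def per_pixel_budget_ns(fps: float, width: int, height: int) -> float:
    """Average time available per pixel, in nanoseconds."""
    pixels = width * height
    return frame_budget_ms(fps) * 1_000_000 / pixels


if __name__ == "__main__":
    print(f"60 FPS budget: {frame_budget_ms(60):.2f} ms")
    for w, h in [(320, 200), (640, 480), (800, 600)]:
        print(f"{w}x{h}: {w * h:>7} pixels, "
              f"{per_pixel_budget_ns(60, w, h):.1f} ns per pixel")
```

At 640x480 this leaves roughly 54 ns per pixel. On a hypothetical 100 MHz CPU (10 ns per cycle), that is only about 5 clock cycles per pixel, and nothing is left over for AI or physics.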
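2. Scan-Line Interleave (SLI)
Before parallel arrays were built into a single chip, 3dfx introduced Scan-Line Interleave (SLI). Two physical graphics cards computed alternating horizontal scan lines, and this shifted the industry's focus from single-thread speed to raw 'brute force' throughput.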
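The work split behind scan-line interleaving fits in a few lines of code. The sketch below is in Python and does not reproduce 3dfx's hardware or driver logic. The names `assign_scanlines` and `render_frame` are hypothetical. Each "device" simply writes its own id into a shared frame buffer, which makes the alternating pattern visible.

```python
# Sketch of scan-line interleaving: devices split one frame by taking
# alternating horizontal lines. Models only the division of work.
from typing import List


def assign_scanlines(height: int, num_devices: int = 2) -> List[List[int]]:
    """Return, for each device, the scan-line indices it renders.
    Line y goes to device y % num_devices (even -> 0, odd -> 1)."""
    assignment: List[List[int]] = [[] for _ in range(num_devices)]
    for y in range(height):
        assignment[y % num_devices].append(y)
    return assignment


def render_frame(width: int, height: int, num_devices: int = 2) -> List[List[int]]:
    """Build a frame buffer where each pixel stores the id of the device
    that 'rendered' it."""
    frame = [[-1] * width for _ in range(height)]
    for device, lines in enumerate(assign_scanlines(height, num_devices)):
        for y in lines:
            for x in range(width):
                frame[y][x] = device  # stand-in for real shading work
    return frame


if __name__ == "__main__":
    print([len(lines) for lines in assign_scanlines(480)])  # [240, 240]
    for row in render_frame(8, 4):
        print("".join(str(p) for p in row))
```

Each of the two devices gets 240 of the 480 lines. The printed 8x4 buffer alternates between rows of `0` and rows of `1`.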
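3. Throughput vs. Latency
The GPU's design philosophy spends silicon area on simple arithmetic units rather than on complex branch prediction. This 'wide and slow' philosophy lets the GPU work through the repetitive math of triangles, while the CPU remains best suited to logic that cannot be parallelized.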
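A toy model can make the 'wide and slow' trade-off concrete. The sketch below compares one fast unit with many slower ones. The `Processor` class, its fields, and every number in it are hypothetical, chosen to illustrate the idea rather than describe any real chip.

```python
# Toy throughput-vs-latency model. All numbers are hypothetical.
import math
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    units: int           # independent arithmetic units working in parallel
    ns_per_pixel: float  # time one unit needs to finish one pixel (latency)

    def frame_time_ms(self, pixels: int) -> float:
        """Pixels are split evenly across units; each unit processes its
        share one pixel after another."""
        batches = math.ceil(pixels / self.units)
        return batches * self.ns_per_pixel / 1_000_000


if __name__ == "__main__":
    pixels = 640 * 480
    cpu_like = Processor("latency-oriented", units=1, ns_per_pixel=50.0)
    gpu_like = Processor("wide and slow", units=64, ns_per_pixel=400.0)
    for p in (cpu_like, gpu_like):
        print(f"{p.name:>16}: {p.ns_per_pixel:6.1f} ns/pixel per unit, "
              f"frame in {p.frame_time_ms(pixels):6.2f} ms")
```

With these made-up numbers, each 'wide and slow' unit is 8 times slower per pixel. The whole frame still takes 1.92 ms instead of 15.36 ms, because 64 units work at once.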
QUESTION 1
What is the specific 'time budget' required for 60 frames per second (FPS)?
33.33ms
16.67ms
10.00ms
100.00ms
Answer: 16.67ms. $1000\,\text{ms} / 60 \approx 16.67\,\text{ms}$. This is the window a GPU has to complete all calculations for one frame.
Hint: Think of 1 second (1000 ms) divided by 60 frames.
QUESTION 2
How did 3dfx's SLI achieve early parallelism in consumer hardware?
By increasing the clock speed of a single chip.
By having two cards render alternating horizontal scan lines.
By sharing AI logic between the GPU and CPU.
By reducing the resolution of the frame.
Answer: By having two cards render alternating horizontal scan lines. SLI (Scan-Line Interleave) was the first major 'brute force' scaling method for consumer graphics.
Hint: It didn't change clock speed or resolution; it distributed the workload across two physical devices.
QUESTION 3
Why did the GPU diverge from the standard multicore trajectory of CPUs?
GPUs needed deeper caches for complex branching.
GPUs prioritize throughput of simple math over low-latency serial logic.
CPUs became too expensive to manufacture for 3D graphics.
GPU architectures were designed to be smaller than CPUs.
Answer: GPUs prioritize throughput of simple math over low-latency serial logic. CPUs focus on minimizing latency for single tasks; GPUs focus on maximizing the total number of tasks completed simultaneously (throughput).
Hint: Review the 'wide and slow' philosophy versus the CPU's complex control logic.
QUESTION 4
In the context of 1990s gaming, what was the 'Real-Time Imperative'?
The requirement to run physics simulations on the GPU.
Processing millions of pixels within the strict frame window.
The transition from 16-bit to 32-bit computing.
Allowing the CPU to handle rasterization.
Answer: Processing millions of pixels within the strict frame window. Without a dedicated accelerator, a CPU could not finish drawing a frame before the display refreshed, so frames were dropped and the game stuttered.
Hint: It refers to the strict deadline imposed by the display's refresh rate.
QUESTION 5
What is meant by the GPU's 'Wide and Slow' philosophy?
Using many simple processors at lower clock speeds to do massive work.
Designing physically wide chips that take longer to process data.
A design that favors high latency but high memory capacity.
Optimizing for single-threaded serial logic.
Answer: Using many simple processors at lower clock speeds to do massive work. Modern GPUs use thousands of small cores; each core is slow on its own, but their collective throughput is massive.
Hint: It refers to parallel width (many cores), not physical dimensions or slow overall performance.
Case Study: The Quake Revolution
The transition from Software to Hardware Rasterization
In 1996, 'Quake' was a technical marvel. Running it on the CPU alone (software rendering) meant the processor handled AI, sound, networking, and the math for every pixel. 3dfx's Voodoo1 changed this by providing a dedicated graphics pipeline.
Q1. Why did offloading the graphics pipeline significantly improve 'Quake's' frame rate even if the CPU wasn't upgraded?
Solution:
Offloading removed the 'dumb math' of pixel rasterization from the CPU. This freed its cycles for game logic, physics, and the geometry work the Voodoo1 still left to it, while the dedicated graphics hardware calculated pixel values in parallel.
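The Quake answer rests on the CPU and the accelerator working at the same time. Here is a minimal timing sketch. It assumes a simple pipeline in which the CPU runs logic for frame N while the card rasterizes frame N-1, so the slower stage sets the pace. The function names and all timings are hypothetical, and real drivers and buffering are more complex.

```python
# Toy model: why offloading rasterization raises frame rate without a
# faster CPU. All timings are hypothetical.


def software_frame_ms(logic_ms: float, raster_ms: float) -> float:
    """CPU does its game work and then rasterizes every pixel itself."""
    return logic_ms + raster_ms


def offloaded_frame_ms(logic_ms: float, accel_raster_ms: float) -> float:
    """Assumed pipeline: CPU works on frame N while the accelerator
    rasterizes frame N-1, so the slower stage sets the pace."""
    return max(logic_ms, accel_raster_ms)


def fps(frame_ms: float) -> float:
    return 1000.0 / frame_ms


if __name__ == "__main__":
    logic_ms = 10.0        # everything the CPU still does per frame
    cpu_raster_ms = 30.0   # software rasterization on the CPU
    accel_raster_ms = 8.0  # the same frame on dedicated hardware

    sw = software_frame_ms(logic_ms, cpu_raster_ms)
    hw = offloaded_frame_ms(logic_ms, accel_raster_ms)
    print(f"software:  {sw:.1f} ms/frame ({fps(sw):.1f} FPS)")
    print(f"offloaded: {hw:.1f} ms/frame ({fps(hw):.1f} FPS)")
```

With these made-up timings, the same CPU goes from 40 ms per frame (25 FPS) to 10 ms per frame (100 FPS). After offloading, the CPU's own work becomes the limit.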
Q2. How does the 'Real-Time Imperative' explain the visual stuttering observed on CPUs when resolution was increased?
Solution:
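As resolution increases, the pixel count grows with width × height: 640×480 has 307,200 pixels, 4.8 times the 64,000 of 320×200. Since a CPU processes pixels serially, the total time eventually exceeds the 16.67 ms frame budget. The frame then misses the display refresh and is dropped, which shows up as stutter.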
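The stutter argument can be checked with simple arithmetic. The sketch below computes serial pixel time for a few resolutions and compares it with the 60 FPS budget. The per-pixel cost, the time reserved for game logic, and the function names are all hypothetical values chosen for illustration.

```python
# Sketch: serial pixel cost grows with width * height, so at some
# resolution a CPU exceeds the 60 FPS budget. Costs are hypothetical.

FRAME_BUDGET_MS = 1000.0 / 60  # about 16.67 ms


def serial_raster_ms(width: int, height: int, ns_per_pixel: float) -> float:
    """Time for one core to process every pixel one after another."""
    return width * height * ns_per_pixel / 1_000_000


def fits_budget(width: int, height: int, ns_per_pixel: float,
                logic_ms: float) -> bool:
    """True if game logic plus serial rasterization fits in one frame."""
    total = logic_ms + serial_raster_ms(width, height, ns_per_pixel)
    return total <= FRAME_BUDGET_MS


if __name__ == "__main__":
    ns_per_pixel = 60.0  # hypothetical software-rendering cost per pixel
    logic_ms = 5.0       # hypothetical time for AI, physics, etc.
    for w, h in [(320, 200), (400, 300), (640, 480)]:
        total = logic_ms + serial_raster_ms(w, h, ns_per_pixel)
        verdict = "OK" if fits_budget(w, h, ns_per_pixel, logic_ms) else "drops frames"
        print(f"{w}x{h}: {total:5.2f} ms -> {verdict}")
```

With these assumed costs, 320x200 (8.84 ms) and 400x300 (12.20 ms) fit the budget. At 640x480 the total reaches 23.43 ms, so frames are dropped.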